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LOW-RANK SPECTRAL OPTIMIZATION VIA GAUGE DUALITY 

MICHAEL P. FRIEDLANDER* AND IVES MACEDO* 


Abstract. Various applications in signal processing and machine learning give rise to highly 
structured spectral optimization problems characterized by low-rank solutions. Two important 
examples that motivate this work are optimization problems from phase retrieval and from blind 
deconvolution, which are designed to yield rank-1 solutions. An algorithm is described that is based 
on solving a certain constrained eigenvalue optimization problem that corresponds to the gauge dual 
which, unlike the more typical Lagrange dual, has an especially simple constraint. The dominant 
cost at each iteration is the computation of rightmost eigenpairs of a Hermitian operator. A range of 
numerical examples illustrate the scalability of the approach. 
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low-rank solutions, phase retrieval 
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1. Introduction. There are a number of applications in signal processing and 
machine learning that give rise to highly structured spectral optimization problems. We 
are particularly interested in the class of problems characterized by having solutions that 
are very low rank, and by involving linear operators that are best treated by matrix-free 
approaches. This class of problems is sufficiently restrictive that it allows us to design 
specialized algorithms that scale well and lend themselves to practical applications, 
but it is still sufficiently rich to include interesting problems. Two examples include 
the nuclear-norm minimization for problems such as blind deconvolution (Ahmed, 
Recht, and Romberg, 2014), and the PhaseLift formulation of the celebrated phase 
retrieval problem (Candes, Strohmer, and Voroninski, 2012). The problems can be cast 
generically in the semi-definite programming (SDP) framework, for which a variety 
of algorithms are available. However, typical applications can give rise to enormous 
optimization problems that challenge the very best workhorse algorithms. 

Denote the set of complex-valued nx n Hermitian matrices by T~L n . The algorithm 
that we propose is designed to solve the problems 


subject to ||6 — AX11 < e, X y 0, (1.1a) 


minimize trace X 
xeu n 



subject to ||6 — AX11 < e 


(1.1b) 


minimize 

xec"> x ” 2 


where the parameter e controls the admissible deviations between the linear model 
AX and the vector of observations b. (The particular properties of the vectors b and of 
the linear operators A are detailed in section 1.2.) Our approach for both problems is 
based on first solving a related Hermitian eigenvalue optimization problem over a very 
simple constraint, and then using that solution to recover a solution of the original 
problem. This eigenvalue problem is highly structured, and because the constraint is 
easily handled, we are free to apply a projected first-order method with inexpensive 
per-iteration costs that scales well to very large problems. 
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The key to the approach is to recognize that the problems (1.1) are members of 
the family of gauge optimization problems, which admit a duality concept different 
from the Lagrange duality prevalent in convex optimization. Gauges are nonnegative, 
positively homogeneous convex functions that vanish at the origin. They significantly 
generalize the familiar notion of a norm, which is a symmetric gauge function. The 
class of gauge optimization problems, as defined by Freund’s seminal 1987 work, can be 
stated simply: find the element of a convex set that is minimal with respect to a gauge. 
These conceptually simple problems appear in a remarkable array of applications, and 
include an important cross-section of convex optimization. For example, all of conic 
optimization can be phrased within the class of gauge optimization; see Friedlander, 
Macedo, and Pong (2014, Example 1.3), and section 2 below. 

Problem (1.1a) is not explicitly stated as a gauge problem because the objective 
is not nonnegative everywhere on its domain, as required in order for it to be a gauge 
function. It is, however, nonnegative on the feasible set, and the problem can easily 
be cast in the gauge framework simply by changing the objective function to 

traceX + 5{X \ ■ + 0), where S(X \ ■ + 0) = < ^ if X _ 0, ( 1 . 2 ) 

l+oo otherwise. 

This substitution yields an equivalent problem, and the resulting convex function 
is nonnegative and positively homogeneous—and therefore a gauge function. More 
generally, it is evident that any function of the form 7 + £(• | K.) is a gauge, in which 7 
is a gauge and S(- \ K.) is the indicator of a convex cone K. 

The method that we develop applies to the much broader class of semidefinite 
optimization problems with nonnegative objective values, as described in section 6 . We 
pay special attention to the low-rank spectral problems just mentioned because they 
have a special structure that can be exploited both theoretically and computationally. 

1.1. Notation. To emphasize the role of the vector of singular values cr(X), 
we adopt the Schatten p-norm notation for the matrix-norms referenced in this pa¬ 
per, i.e., ||X|| p := ||cr(X)|| p . Thus, the nuclear, Frobenius, and spectral norms of 
a matrix A' are denoted by HX)^, ||X|| 2 , and || Al ||^, respectively. The notation 
for complex-valued quantities, particularly in the SDP context, is not entirely stan¬ 
dard. Here we define some objects we use frequently. Define the complex inner 
product (X,Y) := trace XT*, where Y* is the conjugate transpose of a complex 
matrix Y , i.e., Y* = Y T . The set ofnxn Hermitian matrices is denoted by 'H n , 
and X + 0 (resp., X + 0) indicates that the matrix X is both Hermitian and 
positive semidefinite (resp., definite). Let A(A) be the vector of ordered eigenval¬ 
ues of A £ T~L n , i.e., A X (A) > A 2 (A) > ••• > A n (A). (An analogous ordering is 
assumed for the vector of singular values.) For B + 0, let A(A, B) denote the vec¬ 
tor of generalized eigenvalues of the pencil (A, B), i.e., X(A,B) = \(B~^ 2 AB~^ 2 ). 
Let IH(-) and 3(-) denote the real and imaginary parts of their arguments. The norm 
dual to || • || : C m —X M + is defined by 

||a;||* := sup 9\(z,x). (1.3) 

1MI<1 

The positive part of a scalar is denoted by [•]+ = max{0, •}. 

When we make reference to one- and two-dimensional signals, our intent is to dif¬ 
ferentiate between problems that involve discretized functions of one and two variables, 
respectively, rather than to describe the dimension of the ambient space. Hence the 
terms two-dimensional signals and two-dimensional images are used interchangeably. 
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Generally we assume that the problems are feasible, although we highlight in 
section 2 how to detect infeasible problems. We also assume that 0 < e < || 6 ||, which 
ensures that the origin is not a trivial solution. In practice, the choice of the norm 
that defines the feasible set will greatly influence the computational difficulty of the 
problem. Our implementation is based on the 2-norm, which often appears in many 
practical applications. Our theoretical developments, however, allow for any norm. 

1.2. Problem formulations. Below we describe two applications, in phase 
retrieval and blind deconvolution, that motivate our work. There are other relevant 
examples, such as matrix completion (Recht, Fazel, and Parrilo, 2010), but these two 
applications require optimization problems that exemplify properties we exploit in our 
approach. 

1.2.1. Phase retrieval. The phase retrieval problem is concerned with recovery 
of the phase information of a signal—e.g., an image—from magnitude-only measure¬ 
ments. One important application is X-ray crystallography, which generates images of 
the molecular structure of crystals (Harrison, 1993). Other applications are described 
by Candes et al. (2012) and Waldspurger, d’Aspremont, and Mallat (2015). They 
describe the following recovery approach, based on convex optimization. 

Magnitude-only measurements of the signal x £ C" can be described as quadratic 
measurements of the form 


b k = l(z,a fc )| 2 

for some vectors a k that encode the waveforms used to illuminate the signal. These 
quadratic measurements of x can be understood as linear measurements 

b k = ( xx*,a k a k ) = ( X,A k ) 

of the lifted signal X := xx *, where A k := a k a k is the fcth lifted rank -1 measurement 
matrix. 

In the matrix space, the trace of the unknown lifted signal X acts as a surrogate for 
the rank function. This is analogous to the 1-norm, which stands as a convex surrogate 
for counting the number of nonzeros in a vector. This leads us to an optimization 
problem of the form (1.1a), where A : H n -A R m is defined by (AX) k := (X,A k ). The 
parameter e anticipates noise in the measurements. Candes et al. (2012) call this the 
PhaseLift formulation. In section 5.1 we give numerical examples for recovering one- 
and two-dimensional signals with and without noise. 

1.2.2. Biconvex compressed sensing and blind deconvolution. The bicon¬ 
vex compressed sensing problem (Ling and Strohmer, 2015) aims to recover two signals 
from a number of sesquilinear measurements of the form 

^k (^T, di k ) (x 2) ^2k) ? 

where x 1 £ C™ 1 and x 2 £ C™ 2 . In the context of blind deconvolution, aq and x 2 
correspond to coefficients of the signals in some bases. The lifting approached used in 
the phase retrieval formulation can again be used, and the measurements of x 1 and x 2 
can be understood as coming from linear measurements 

b k = {x^a^alk) = (X,A k ) 

of the lifted signal X = aqaq, where A k := a lk a 2k is the lifted asymmetric rank-1 
measurement matrix. Ahmed et al. (2014) study conditions on the structure and the 
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number of measurements that guarantee that the original vectors (up to a phase) 
may be recovered by minimizing the sum of singular values of X subject to the linear 
measurements. This leads to an optimization problem of the form (1.1b), where 
A : C niXn2 -A C m is defined by (AX) k := (X,A k ). 

In section 5.2, we describe a two-dimensional blind deconvolution application 
from motion deblurring, and there we provide further details on the structure of the 
measurement operators A k: and report on numerical experiments. 

1.3. Reduction to Hermitian SDP. It is convenient, for both our theoretical 
and algorithmic development, to embed the nuclear-norm minimization problem (1.1b) 
within the symmetric SDP (1.1a). The resulting theory is no less general, and it permits 
us to solve both problems with what is essentially a single software implementation. 
The reduction to the Hermitian trace-minimization problem (1.1a) takes the form 


minimize 

.Yec ” lX ” 2 

r x ,r 2 GR m 

subject to 


I 

0\ 

u 

X \ 

u 

I)' 

, \ x * 

V 


0 

A* k 

0 

-A* k 


A k 

0 

A k 

0 


U 

X* 

u 

X* 


+ ir 2 \\ < e, 


X 

V 

X 

V 


u 

X* 


+ r lk — *Rb k , 

+ r 2k = 

h. 0, k= 1, ...,m. 


(1.4) 


The residual variables rq, r 2 £ R m merely serve to allow the compact presentation 
above, as they can be eliminated using the equality constraints. The additional variables 
U and V, at the minimizer, correspond to (XX*) 1 / 2 and (X*A') 1 / 2 , respectively; the 
variable X retains its original meaning. This reduction is based on a well-known 
reformulation of the nuclear norm as the optimal value of an SDP; see Fazel (2002, 
Lemma 2) or Recht et al. (2010, Proof of Proposition 2.1). 

Although this reduction is convenient for our presentation, it is not strictly 
necessary, as will be clear from the results in section 6. In fact, the resulting increase 
in problem size might affect a solver’s performance, as would likely be noticeable if 
dense solvers are employed. Our focus, however, is on large problems that require 
matrix-free operators and exhibit low-rank solutions. Throughout this paper, we focus 
entirely on the SDP formulation (1.1a) without loss of generality. 

1.4. Approach. Our strategy for these low-rank spectral optimization problems 
is based on solving the constrained eigenvalue optimization problem 

minimize A i(A* y) subject to (6, y) — e||?/||* > 1 (1.5) 


that results from applying gauge duality (Friedlander et al., 2014; Freund, 1987) to 
a suitable reformulation of (1.1a). This is outlined in section 2. The dimension of 
the variable y in the eigenvalue optimization problem corresponds to the number of 
measurements. In the context of phase retrieval and blind deconvolution, Candes and 
Li (2014) and Ahmed et al. (2014) show that the number of measurements needed to 
recover with high probability the underlying signals is within a logarithmic factor of 
the signal length. The crucial implication is that the dimension of the dual problem 
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grows slowly as compared to the dimension of the primal problem, which grows as the 
square of the signal length. 

In our implementation, we apply a simple first-order projected subgradient method 
to solve the eigenvalue problem. The dominant cost at each iteration of our algorithm 
is the computation of rightmost eigenpairs of the n x n Hermitian linear operator 
A*y, which are used to construct descent directions for (1.5). The structure of the 
measurement operators allows us to use Krylov-based eigensolvers, such as ARPACK 
(Lehoucq, Sorensen, and Yang, 1998), for obtaining these leading eigenpairs. Primal 
solution estimates A' are recovered via a relatively small constrained least-squares 
problem, described in section 4. 

An analogous approach based on the classical Lagrangian duality also leads to a 
dual optimization problem in the same space as our dual eigenvalue problem: 

maximize (b, y) — e||y||* subject to A*y^I. (1-6) 

yem™ 

Note that the Lagrange dual possesses a rather simple objective and a difficult linear 
matrix inequality of order n as a constraint. Precisely the reverse situation holds for 
the gauge dual (1.5), which has a relatively simple constraint. 

It is well known that SDPs with a constant-trace property—i.e., AX = b implies 
trace(A) is constant- have Lagrange dual problems that can be formulated as uncon¬ 
strained eigenvalue problems. This approach is used by Helmberg and Rendl (2000) 
to develop a spectral bundle method. The applications that we consider, however, do 
not necessarily have this property. 

1.5. Reproducible research. The data files and Matlab scripts used to gen¬ 
erate the numerical results presented in section 5 can be obtained at the following 
URL: 

http://www.cs.ubc.ca/~mpf/low-rank-opt 

1.6. Related work. Other researchers have recognized the need for algorithms 
with low per-iteration costs that scale well for large-scale, low-rank spectral optimization 
problems. Notable efforts include Hazan (2008), Laue (2012), and Freund, Grigas, and 
Mazumder (2015), who advocate variations of the Frank-Wolfe (FW) method to solve 
some version of the problem 

minimize /(AT) subject to trace(AT) < r, X y 0, (1-7) 

where / is a differentiable function. For example, the choice /(AT) = i||AA — 6||| yields 
a problem related to (1.1a). The asymmetric version of the problem with ||Aj|i < r 
is easily accommodated by simply replacing the above constraints. For simplicity, 
here we focus on the symmetric case, though our approach applies equally to the 
asymmetric case. The main benefit of using FW for this problem is that each iteration 
requires only a rightmost eigenvalue of the gradient V/(A), and therefore has the 
same per-iteration cost of the method that we consider, which requires a rightmost 
eigenvalue of the same-sized matrix A* y. The same Krylov-based eigensolvers apply 
in both cases. 

There are at least two issues that need to be addressed when comparing the FW 
algorithm to the approach we take here. First, as Freund et al. (2015) make clear, 
even in cases where low-rank solutions are expected, it is not possible to anticipate the 
rank of early iterates X k generated by the FW method. In particular, they observe 
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that the rank of X k quickly increases during early iterations, and only slowly starts to 
decrease as the solution is approached. This motivates their development of algorithmic 
devices that attenuate rank growth in intermediate iterates. Any implementation, 
however, must be prepared to increase storage for the factors of X k during intermediate 
iterates. In contrast, a subgradient method applied to the gauge dual problem can 
be implemented with constant storage. Second, although in principle there exists a 
parameter r that causes the optimization problems (1.7) and (1.1a) to share the same 
solution, this parameter is not generally known in advance. One way around this is 
to solve a sequence of problems (1.7) for varying parameters r k using, for example, a 
level-set procedure described by Aravkin, Burke, Drusvyatskiy, Friedlander, and Roy 
(2016). 

As an alternative to applying the FW algorithm to (1.7), we might instead consider 
applying a variation of the FW method directly to the gauge dual problem (1.5). 
Because the gauge dual objective is not differentiable and the feasible set is not 
compact if e = 0, some modification to the standard FW method is required. Argyriou, 
Signoretto, and Suykens (2014), Bach (2015), and Nesterov (2015) propose variations 
of FW that involve smoothing the objective. These smoothing approaches are typically 
based on infimal convolution with a smooth kernel, which may lead to a function whose 
gradient is expensive to compute. For example, the “soft-max” smooth approximation 
of A x (-) is the function /MogJT =1 n ex p(A^ (-) / /u). Forming the gradient of this 
smooth function requires computing all eigenvalues of an n-by-n Hermitian matrix. 

Laue (2012) proposes a hybrid algorithm that interleaves a nonconvex subproblem 
within the FW iterations. If the local minimum of the nonconvex subproblem improves 
the objective value, it is used to replace the current FW iterate. This approach is 
similar in spirit to the primal-dual refinement that we describe in section 4.3, but 
because Laue’s method is entirely primal, it has the benefit of not requiring a procedure 
to feed the improved primal sequence back to the dual sequence. 

2. Spectral gauge optimization and duality. The derivation of the eigenvalue 
optimization problem (1.5) as a dual to (1.1a) follows from a more general theory of 
duality for gauge optimization. Here we provide some minimal background for our 
derivations related to spectral optimization; see Freund (1987) and Friedlander et al. 
(2014) for fuller descriptions. We begin with a general description of the problem class. 

Let k : X i-a 1 U {+oc} and p : y i-A R U {+oc} be gauge functions, where 
A : X i—> y is a linear operator that maps between the finite-dimensional real inner- 
product spaces X and y. The polar 

f°(y) ■= inf { p > 0 | (x , y) < pf{x) Vx } 
of a gauge / plays a key role in the duality of gauge problems. The problems 

minimize k(x) subject to p{b — Ax) < e, (2.1a) 

x^X 

minimize n°(A*y) subject to (b,y) — ep°{y) > 1, (2-lb) 

yey 

are dual to each other in the following sense: all primal-dual feasible pairs (x, y) satisfy 
the weak-duality relationship 

1 < K(x)K°(A*y). (2.2) 

Moreover, a primal-dual feasible pair is optimal if this holds with equality. This 
strong-duality relationship provides a certificate of optimality. 
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The SDP problem (1.1a) can be cast into the mold of the canonical gauge formu¬ 
lation ( 2 . 1 a) by using the redefined objective ( 1 . 2 ) and making the identifications 

k(X) = trace A + S(X \ ■ y 0) and p(r) = ||r||. 

We use the polar calculus described by Friedlander et al. (2014, Section 7.2.1) together 
with the definition of the dual norm to obtain the correponding polar functions: 

k °(F) = [Ai(F)] + and p°(y) = ||y||*. 

It then follows from (2.1) that the following are a dual gauge pair: 

minimize traceX + 6(X \ ■ y 0) subject to ||6 — AXj| < e, (2.3a) 
xeH n 

minimize [Aj,(^4.*y)]_t_ subject to ( 6 ,y) — e||y||* > 1. (2.3b) 

The derivation of gauge dual problems relies on the polarity operation applied to 
gauges. When applied to a norm, for example, the polar is simply the dual norm. In 
contrast, Lagrange duality is intimately tied to conjugacy, which is what gives rise 
to the dual problem (1.6). Of course, the two operations are closely related. For any 
guage function k, for example, K*(y) = <5 K °(.)<i(y). These relationships are described 
in detail by Rockafellar (1970, Section 15) and Friedlander et al. (2014, Section 2.3). 

We can simplify the dual objective and safely eliminate the positive-part operator: 
because k(X) is necessarily strictly positive for all nonzero X, and is additionally 
finite over the feasible set of the original problem ( 1 . 1 a), it follows from ( 2 . 2 ) that 
k° (A* y) is positive for all dual feasible points. In other words, 

0<[\ 1 (A*y)] + = \ 1 (A*y) (2.4) 

for all dual feasible points y. Hence we obtain the equivalent dual problem (1.5). 

In practice, we need to be prepared to detect whether the primal problem (2.3a) 
is infeasible. The failure of condition (2.4) in fact furnishes a certificate of infeasibility 
for ( 1 . 1 a): if X(A* y) = 0 for some dual-feasible vector y, it follows from ( 2 . 2 ) that 
k(X) is necessarily infinite over the feasible set of (2.3a)— i.e., X ^ 0 for all X feasible 
for (2.3a). Thus, (1.1a) is infeasible. 

3. Derivation of the approach. There are two key theoretical pieces needed for 
our approach. The first is the derivation of the eigenvalue optimization problem (1.5), 
as shown in section 2. The second piece is the derivation of a subproblem that allows 
recovery of a primal solution A' from a solution of the eigenvalue problem (1.5). 

3.1. Recovering a primal solution. Our derivation of a subproblem for primal 
recovery proceeds in two stages. The first stage develops necessary and sufficient 
optimality conditions for the primal-dual gauge pair (2.3a) and (2.3b). The second 
stage uses these to derive a subproblem that can be used to recover a primal solution 
from a dual solution. 

3.1.1. Strong duality and optimality conditions. The weak duality condi¬ 
tion (2.2) holds for all primal-feasible pairs (A', y). The following result asserts that if 
the pair is optimal, then that inequality must necessarily hold tightly. 

Proposition 3.1 (Strong duality). If (1.1a) is feasible and 0 < e < || 6 ||, then 


min trace A' + 5(X \ ■ >- 0) 


inf i [A 1 M*y)] + 

xeH" 

\\b—AX\\<e 


j/EM 

_(b,y)-e IMI*>1 


= 1 . 


(3.1) 
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Proof. We proceed by reasoning about the Lagrangian-dual pair (1.1a) and (1.6). 
We then translate these results to the corresponding gauge-dual pair (2.3a) and (2.3b). 

The primal problem (1.1a) is feasible by assumption. Because its Lagrange dual 
problem (1.6) admits strictly feasible points (e.g., y = 0), it follows from Rockafellar 
(1970, Theorems 28.2 and 28.4) that the primal problem attains its positive minimum 
value and that there is zero duality gap between the Lagrange-dual pair. 

Moreover, because the primal problem (1.1a) attains its positive minimum value 
for some A, and there is zero duality gap, there exists a sequence {yj} such that 
[A 1 (M* 2 / j )] + < 1 and (yj,b) — e||2/j ||* traceA. Because trace A' > 0, we can take a 
subsequence {Vj k } for which {yj k ,b) — e11j/y||* is uniformly bounded above zero. Define 
the sequence {y k } by y k := y jk {{y jk ,b} - e||^J|*) _1 . Then (y k ,b) - e||y fc ||* = 1 for 
all k, which is a feasible sequence for the gauge dual problem (2.3b). Weak gauge 
duality (2.2) and the definition of y k then implies that 

(traceA) -1 < ^(.4*2/*,)]+ < (( y jk ,b) - (||?/ y J|*) 4 \ (traceA) -1 . 

Multiply the series of inequalities by traceA' to obtain (3.1). □ 

Note the lack of symmetry in the statement of Proposition 3.1: the primal problem 
is stated with a “min”, but the dual problem is stated with an “inf”. This is because 
the dual Slater condition—i.e., strict feasibility of the corresponding Lagrange-dual 
problem (1.6)— allows us to assert that a primal optimal solution necessarily exists. 
However, we cannot assert in general that a dual optimal solution exists because the 
corresponding primal feasible set does not necessarily satisfy the Slater condition. 

Although in this work we do not attempt to delineate conditions under which dual 
attainment holds, a practical case in which it always does is when the primal objective 
is a norm and the measurement operator is surjective. In that case, the dual gauge 
objective ||_4* • ||* defines a norm in R m , which has compact level sets. Hence a dual 
solution always exists. We comment further on this theoretical question in section 7. 

The following result characterizes gauge primal-dual optimal pairs. It relies on 
von Neumann’s trace inequality: for Hermitian matrices A and B 1 

{A, B) < (\(A),\(B)), 

and equality holds if and only if A and B admit a simultaneous ordered eigendecompo- 
sition, i.e., A = t/Diag[A(A)][/* and B = U Diag[A(H)]C/* for some unitary matrix U; 
see Lewis (1996). 

Proposition 3.2 (Optimality conditions). If (1.1a) is feasible and 0 < e < ||6||, 
then (A', y) £ PL" x R m is primal-dual optimal for the gauge dual pair (2.3a) and 
(2.3b) if and only if the following conditions hold: 

1. X y 0 and ||6 — MA|| = e; 

2. (y , b) — ellylL = 1; 

5. {y,b-AX) = \\yU\b-AX\\- 

4- A,:(A') • (\i(A*y) - A i{A*y)) = 0, i = 1 ,..., n; 

5. X and A* y admit a simultaneous ordered eigendecomposition. 

Proof By strong duality (Proposition 3.1), the pair (A ,y) £ PL" 1 x M m is primal- 
dual optimal if and only if they are primal-dual feasible and the product of their 
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corresponding objective values is equal to one. In this case, 


1 = 


> 

> 

> 

> 

> 


[trace X + S(X |- y 0)] • [X^A* y)\_ 
M(A))-A M*V) 

(A M*y)-e,\(x)) 

(X(A*y),X(X)) 

(A*y,X) 

(v,AX) 

{y, b) - {y,b- AX) 
(y,b)-\\yUb-AX\\ 

(y,b) — e||y||* 

1 . 


(strong duality) 


(A 1 (Al*y) > X i (A*y) and X y 0) 
(von Neumann’s trace inequality) 


(Cauchy-Schwartz inequality) 
(primal feasibility) 
(dual feasibility) 


Thus all of the above inequalities hold with equality. This proves conditions 1-4. 
Condition 5 follows from again invoking von Neumann’s trace inequality and noting its 
implication that A' and A*y share a simultaneous ordered eigenvalue decomposition. 
Sufficiency of those conditions can be verified by simply following the reverse chain of 
reasoning and again noticing that the inequalities can be replaced by equalities. □ 

3.2. Primal recovery subproblem. The optimality conditions stated in Propo¬ 
sition 3.2 furnish the means for deriving a subproblem that can be used to recover a 
primal solution from a dual solution. The next result establishes an explicit relationship 
between primal solutions A and A* y for an arbitrary optimal dual solution y. 

Corollary 3.3. Suppose that the conditions of Proposition 3.2 hold. Let y £ M m 
be an arbitrary optimal solution for the dual gauge program (2.3b), ri £ {1,... ,n} 
be the multiplicity of X 1 (A*y), and Ui £ C" xri be the matrix formed by the first rq 
eigenvectors of A*y. Then a matrix X £ TL n is a solution for the primal problem (2.3a) 
if and only if there exists an r 1 x r 1 matrix S y 0 such that 


X = U^Uf and (b - AX) £ e<9|| • ||*(y). (3.2) 

Proof. The assumptions imply that the optimal dual value is positive. If y £ K m is 
an optimal solution to (2.3b), the positive-homogeneity of its objective and constraint, 
and the positivity of the optimal value, allow us to deduce that the dual constraint 
must be active, i.e., ( y , b) — e||y||* = 1. Thus condition 2 of Proposition 3.2 holds. 

The construction of A' in (3.2) guarantees that it shares a simultaneous ordered 
eigendecomposition with A*y , and that it has rank of r 1 at most. Thus, conditions 4 
and 5 of Proposition 3.2 hold. 

We now show that conditions 1 and 3 of the proposition hold. The subdifferential 
<9|| • ||* corresponds to the set of maximizers of the linear function that defines the 
dual ball; see (1.3). Then because ( b — AX) £ ed\\ ■ ||*(y), it holds that ||6 — AX\\ < e 
and e||y||* = {y,b~AX) < ||y||*||6 - *4A|| < e||y||*, implying that ||6-AlA|| =e and 
(y, b — AX) = ||y||*||& — *4Aj|. This way, condition 1 and 3 of Proposition 3.2 are 
also satisfied. Hence, all the conditions of the proposition are satisfied, and the pair 
(A', y) £ TL n x R m is optimal. 

Suppose now that X £ TL" is optimal for (2.3a). We can invoke Proposition 3.2 
on the pair (X,y) £ Ti" x R m . Condition 4 implies that any eigenvector of A*y 
associated to an eigenvalue A i(A*y) with i > r ± is in the nullspace of A, therefore 
there is an r 1 x r 1 matrix S A 0 such that X = U 1 SUi- Conditions 1 and 3 imply that 
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\\b — Ax’ll < e and (y, b — AX) = e||j/||*, thus verifying that ( b — AX) G ed|| • ||*(y), as 
required. □ 

Corollary 3.3 thus provides us with a way to recover a solution to our model 
problem (1.1a) after computing a solution to the gauge dual problem (2.3a). When 
the residual in (1.1a) is measured in the 2-norm, condition (3.2) simplifies, and implies 
that the matrix S that defines A' = USU* can be obtained by solving 

minimize \\A(U 1 SUi) ^ b e \\ 2 , with b e := b — ey/\\y\\. (3-3) 

When the multiplicity r 1 of the eigenvalue A \{A* y) is much smaller than n, this 
optimization problem is relatively inexpensive. In particular, if rq = 1 -which may be 
expected in some applications such as PhaseLift—the optimization problem is over a 
scalar s that can be obtained immediately as 

s=[(A{u 1 u\)X)} + /\\A{u l u\)\\ 2 

where iq is the rightmost eigenvalue of A* y. This approach exploits the complementar¬ 
ity relation on eigenvalues in condition 4 of Proposition 3.2 to reduce the dimensionality 
of the primal solution recovery. Its computational difficulty effectively depends on 
finding a dual solution y at which the rightmost eigenvalue has low multiplicity rq. 

4. Implementation. The success of our approach hinges on efficiently solving 
the constrained eigenvalue optimization problem (1.6) in order to generate solution 
estimates y and rightmost eigenvector estimates U-y of A*y that we can feed to (3.3). 
The two main properties of this problem that drive our approach are that it has a 
nonsmooth objective and that projections on the feasible set are inexpensive. Our 
implementation is based on a basic projected-subgradient descent method, although 
certainly other choices are available. For example, Nesterov (2009) and Richtarik (2011) 
propose specialized algorithms for minimizing positively homogeneous functions with 
affine constraints; some modification of this approach could possibly apply to (1.6). 
Another possible choice is Helmberg and Rendl’s (2000) spectral bundle method. 
For simplicity, and because it has proven sufficient for our needs, we use a standard 
projected subgradient method, described below. 

4.1. Dual descent. The generic subgradient method is based on the iteration 

y+ = V{y-ag), (4.1) 

where g is a subgradient of the objective at the current iterate y, a is a positive 
steplength, and the operator V : R m —> R m gives the Euclidean projection onto the 
feasible set. For the objective function f(y) = X 1 (A*y) of (1.5), the subdifferential 
has the form 


df(y) = {A(U 1 TUl)\Th0, traceT=l}, (4.2) 

where Ui is the n x rq matrix of rightmost eigenvectors of A*y (Overton, 1992, 
Theorem 3). A Krylov-based eigenvalue solver can be used to evaluate f{y) and 
a subgradient g G df{y). Such methods require products of the form (A*y)v for 
arbitrary vectors v. In many cases, these products can be computed without explicitly 
forming the matrix A*y. In particular, for the applications described in section 1.2, 
these products can be computed entirely using fast operators such as the FFT. Similar 
efficiencies can be used to compute a subgradient g from the forward map A(U]TUl). 
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For large problems, further efficiencies can be obtained simply by computing a single 
eigenvector iq, i.e., any unit-norm vector in the range of U 1 . In our implementation, 
we typically request at least two rightmost eigenpairs: this gives us an opportunity 
to detect if the leading eigenpair is isolated. If it is, then the subdifferential contains 
only a single element, which implies that / is differentiable at that point. 

Any sequence of step lengths {a k } that satisfies the generic conditions 

OO 

lim a k = 0, y a k = oo 

k—yoo ' 

is sufficient to guarantee that the value of the objective at y k converges to the optimal 
value (Bertsekas, 2015, Proposition 3.2.6). A typical choice is a k = 0(l/k). Our 
implementation defaults to a Barzilai-Borwein steplength (Barzilai and Borwein, 1988) 
with a nonmonotonic linesearch (Zhang and Hager, 2004) if it is detected that a 
sequence of iterates is differentiable (by observing separation of the leading eigenpair); 
and otherwise it falls back to a decreasing step size. 

The projection operator V onto the dual-feasible set (1.5) is inexpensive when the 
residual is measured in the 2-norm. In particular, if e = 0, the dual-feasible set is a 
halfspace, and the projection can be accomplished in linear time. When e is positive, 
the projection requires computing the roots of a 1-dimensional degree-4 polynomial, 
which in practice requires little additional time. 

4.2. Primal recovery. At each iteration of the descent method (4.1) for the 
eigenvalue optimization problem (1.5), we compute a corresponding primal estimate 

X + = U,S + Ut (4.3) 

maintained in factored form. The matrix U 1 has already been computed in the 
evaluation of the objective and its subgradient; see (4.2). The positive semidefinite 
matrix S + is the solution of the primal-recovery problem (3.3). 

A byproduct of the primal-recovery problem is that it provides a suitable stopping 
criterion for the overall algorithm. Because the iterations y k are dual feasible, it follows 
from Corollary 3.3 that if (3.3) has a zero residual, then the dual iterate y k and the 
corresponding primal iterate X k are optimal. Thus, we use the size of the residual to 
determine a stopping test for approximate optimality. 

4.3. Primal-dual refinement. The primal-recovery procedure outlined in sec¬ 
tion 4.2 is used only as a stopping criterion, and does not directly affect the sequence 
of dual iterates from (4.1). In our numerical experiments, we find that significant gains 
can be had by refining the primal estimate (4.3) and feeding it back into the dual 
sequence. We use the following procedure, which involves two auxiliary subproblems 
that add relatively little to the overall cost. 

The first step is to refine the primal estimate obtained via (3.3) by using its 
solution to determine the starting point Z 0 = U-\ S+ for the smooth unconstrained 
non-convex problem 

minimize h(Z) := ||| A(ZZ*) — b e || 2 . (4-4) 

zeC rlXr 

In effect, we continue to minimize (3.3), where additionally is allowed to vary. 
Several options are available for solving this smooth unconstrained problem. Our 
implementation has the option of using a steepest-descent iteration with a spectral 
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steplength and non-monotone linesearch (Zhang and Hager, 2004), or a limited-memory 
BFGS method (Nocedal and Wright, 2006, Section 7.2). The main cost at each iteration 
is the evaluation of the gradient 

Xh(Z) = A*{A{ZZ*)-b t )Z. (4.5) 


We thus obtain a candidate improved primal estimate X = ZZ *, where Z is a solution 
of (4.4). When e = 0, this non-convex problem coincides with the problem used by 
Candes, Li, and Soltanolkotabi (2015). They use the initialization Z 0 = 7 %, where u 1 
is a leading eigenvector of A*b , and 7 = n JT 6j/ )Tb || a zl| 2 - Our initialization, on the 
other hand, is based on a solution of the primal-recovery problem (3.3). 

The second step of the refinement procedure is to construct a candidate dual 
estimate y from a solution of the constrained linear-least-squares problem 

minimize b\\(A*y)Z — \Z\\ 2 subject to ( 6 , y) — e||y||» > 1, (4.6) 

yes™ 

where A := 1 /traceX = 1/||Z||^ is the reciprocal of the primal objective value 
associated with X. This constrained linear-least-squares problem attempts to construct 
a vector y such that the columns of Z correspond to eigenvectors of A*y associated 
with A. If f(y) < f(y+), then y improves on the current dual iterate y + obtained by 
the descent method (4.1), and we are free to use y in its place. This improved estimate, 
which is exogenous to the dual descent method, can be considered a “spacer” iterate, 
as described by Bertsekas (1999, Proposition 1.2.6). Importantly, it does not interfere 
with the convergence of the underlying descent method. The projected-descent method 
used to solve the dual sequence can also be applied to (4.6), though in this case the 
objective is guaranteed to be differentiable. 

4.4. Algorithm summary. The following steps summarize one iteration of the 
dual-descent algorithm: y is the current dual iterate, and y + is the updated iterate. 
The primal iterate X is maintained in factored form. Steps 5-7 implement the primal- 
dual refinement strategy described in section 4.3, and constitute a heuristic that may 
improve the performance of the dual descent algorithm without sacrificing convergence 
guarantees. 


1 ^i(Al y) 

[ eigenvalue computation ] 

2 g <- A{U,TUt) 

[gradient of dual objective; cf. (4.2)] 

s / t- V{y - ag) 

[projected subgradient step; cf. (4.1)] 

4 S + «— solution of (3.3) 

[primal recovery subproblem] 

5 Z <r- solution of (4.4) initialized with S + [ primal refinement ] 

6 y <r- solution of (4.6) initialized with Z [dual refinement] 

if A i(A*y) < Xi then 


7 y + «- y 

[spacer step] 

end 



In Step 1, the rightmost eigenpair (A i,Ui) of A*y is computed. The eigenvectors in 
the matrix U\ are used in Step 2 to compute a subgradient g for the dual objective. 
Any PSD matrix T that has trace equal to 1 can be used in Step 2. For example, 
the case where only a single rightmost eigenvector u 1 can be afforded corresponds to 
setting T so that U 1 TU 1 = rqwj. Step 3 is a projected subgradient iteration with 
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steplength a. Step 4 solves the primal-recovery problem to determine the matrix S + 
used to define a primal estimate X + = UiS + U{\ cf. (3.2). We use the factorization 
Z 0 := U 1 S_(_ to initialize the algorithm in the next step. Step 5 applies an algorithm 
to the nonlinear least-squares problem (4.4) to obtain a stationary point Z used to 
define the dual-refinement problem used in the next step. Step 6 computes a candidate 
dual solution y that- if it improves the dual objective—is used to replace the latest 
dual estimate y + . 

5. Numerical experiments. This section reports on a set of numerical experi¬ 
ments for solving instances of the phase retrieval and blind deconvolution problems 
described in section 1.2. The various algorithmic pieces described in section 4 have 
been implemented as a Matlab software package. The implementation uses Matlab’s 
eigs routine for the eigenvalue computations described in section 4.1. We implemented 
a projected gradient-descent method, which is used for solving (4.1), (4.4), and (4.6). 

5.1. Phase recovery. We conduct three experiments for phase retrieval via the 
PhaseLift formulation. The first experiment is for a large collection of small one- 
dimensional random signals, and is meant to contrast the approach against a general- 
purpose convex optimization algorithm and a specialized non-convex approach. The 
second experiment tests problems where the vector of observations b is contaminated 
by noise, hence testing the case where e > 0. The third experiment tests the scalability 
of the approach on a large two-dimensional natural image. 

Our phase retrieval experiments follow the approach outlined in Candes et al. 
(2015). The diagonal matrices C k £ C" n encode diffraction patterns that correspond 
to the fcth “mask” (k = 1,..., L) through which a signal x 0 £ C n is measured. The 
measurements are given by 


b = A(x 0 Xq) := diag 



where F is the unitary discrete Fourier transform (DFT). The adjoint of the associated 
linear map A is then 


A*y:=J2 C * F * DiagO/ fe )FC fe , 
k-1 

where y = (yi ,..., y^) and Diag (y k ) is the diagonal matrix formed from the vector y k . 
The main cost in the evaluation of the forward map A(VV ) involves L applications 
of the DFT for each column of V. Each evaluation of the adjoint map applied to a 
vector v —i.e., ( A*y)v —requires L applications of both the DFT and its inverse. In the 
experimental results reported below, the columns labeled “nDFT” indicate the total 
number of DFT evaluations used over the course of a run. The costs of these DFT 
evaluations are invariant across the different algorithms, and dominate the overall 
computation. 

5.1.1. Random Gaussian signals. In this section we consider a set of experi¬ 
ments for different numbers of masks. For each value of L = 6 ,7,..., 12, we generate 
a fixed set of 100 random complex Gaussian vectors x 0 of length n = 128, and a set 
of L random complex Gaussian masks C k . 
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Table 5.1 

Phase retrieval comparisons for random complex Gaussian signals of size n = 128 measured using 


random complex Gaussian masks. 

Numbers of the form n_ e are g 

i shorthand for n 

■ 10 e . 


L 

GAUGE 

GAUGE- 

-f eas 

TFOCS 



WFLOW 


nDFT 

xErr 

nDFT 

xErr 

nDFT 

xErr 

% 

nDFT 

xErr 

% 

12 

18,330 

1 - 6-6 

3,528 

1-3-6 

2,341,800 

3.6-3 

100 

5,232 

1-2-5 

100 

11 

19,256 

1-5-6 

3,344 

1-4-6 

2,427,546 

4.3-3 

100 

4,906 

1.6-5 

100 

10 

19,045 

1.4-6 

3,120 

1 . 6-6 

2,857,650 

5.5_3 

100 

4,620 

2.1-5 

100 

9 

21,933 

1 - 6-6 

2,889 

1-4-6 

1.2 ■ 10 7 

CO 

1 

ic 

89 

4,374 

2.5-s 

100 

8 

23,144 

2 . 1-6 

2,688 

1.9 _ 6 

1.1 ■ 10 7 

1 . 2-2 

22 

4,080 

3.3-5 

100 

7 

25,781 

1 - 8-6 

2,492 

2 . 0 -e 

6,853,245 

2.4 - 2 

0 

3,836 

5.2-s 

95 

6 

34,689 

3.0-e 

2,424 

2.5 - 6 

2,664,126 

6.4-2 

0 

3,954 

9.5-s 

62 


Table 5.2 

Additional comparisons for the random examples of Table 5.1. 


L 

GAUGE 

GAUGE-nodfp 

nDFT 

xErr 

nDFT 

xErr 

12 

18,330 

1 - 6-6 

277,722 

1 - 6-6 

11 

19,256 

1-5-6 

314,820 

1 - 6-6 

10 

19,045 

1-4-6 

374,190 

2 . 0-6 

9 

21,933 

1 . 6-6 

485,658 

1-9-6 

8 

23,144 

2 . 1-6 

808,792 

1.9-6 

7 

25,781 

1 . 8_ 6 

2,236,885 

2.3-6 

6 

34,689 

CD 

1 

O 

CO 

14,368,437 

2.9-e 


Table 5.1 summarizes the results of applying four different solvers to each set of 
100 problems. The solver GAUGE is our implementation of the approach summarized 
in section 4.4; TFOCS (Becker, Candes, and Grant, 2011) is a first-order conic solver 
applied to the primal problem (1.1a). The version used here was modified to avoid 
explicitly forming the matrix A*y (Strohmer, 2013). The algorithm WFLOW (Candes 
et ah, 2015) is a non-convex approach that attempts to recover the original signal 
directly from the feasibility problem (4.4), with e = 0. To make sensible performance 
comparisons to WFLOW, we add to its implementation a stopping test based on the 
norm of the gradient (4.5); the default algorithm otherwise uses a fixed number of 
iterations. 

We also show the results of applying the GAUGE code in a “feasibility” mode 
that exits as soon as the primal-refinment subproblem (see Step 7 of the algorithm 
summary in section 4.4) obtains a solution with a small residual. This resulting solver 
is labeled GAUGE-feas. This variant of GAUGE is in some respects akin to WFLOW, with 
the main difference that GAUGE-feas uses starting points generated by the dual-descent 
estimates, and generates search directions and step-lengths for the feasibility problem 
from a spectral gradient algorithm. The columns labeled “xErr” report the median 
relative error ||x 0 a:o — xx* Hf/H^oll^ of the 100 runs, where £ is the solution returned 
by the corresponding solver. The columns labeled “%” give the percentage of problems 
solved to within a relative error of 10~ 2 . At least on this set of artificial experiments, 
the GAUGE solver (and its feasibility variant GAUGE-feas) appear to be most efficient. 
Table 5.2 provides an additional comparison of GAUGE with the variation GAUGE-nodfp, 
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which ignores the “spacer” iterate computed by (4.6). There seems to be significant 
practical benefit in using the refined primal estimate to improve the dual sequence. 
The columns labeled “%” are excluded for all versions of GAUGE because these solvers 
obtained the prescribed accuracy for all problems in each test set. 

Note that the relative accuracy “xErr” is often slightly better for GAUGE-feas 
than for GAUGE. These small small discrepancies are explained by the different stopping 
criteria between the two versions of the solver. In particular, GAUGE will continue 
iterating past the point at which GAUGE-feas normally terminates because it is 
searching for a dual certificate that corresponds to the recovered primal estimate. 
This slightly changes the computed subspaces XJ \, which influence subsequent primal 
estimates. Similar behaviour is exhbited in the noisy cases that we consider in the 
next section. 

As the number of measurements (L) decreases, we expect the problem to be more 
difficult. Indeed, we can observe that the total amount of work, as measured by 
the number of operator evaluations (i.e., the ratio between nDFT and L), increases 
monotonically for all variations of GAUGE. 

5.1.2. Random problems with noise. In this set of experiments, we assess 
the effectiveness of the SDP solver to problems with e > 0, which could be useful in 
recovering signals with noise. For this purpose, it is convenient to generate problems 
instances with noise and known primal-dual solutions, which we can do by using 
Corollary 3.3. Each instance is generated by first sampling octanary masks C k —as 
described by Candes et al. (2015)— and real Gaussian vectors y £ R m ; a solution x 0 £ 
C” is then chosen as a unit-norm rightmost eigenvector of A y , and the measurements 
are computed as b := A(x 0 Xo)+ey/\\y\\, where e is chosen as e := ||6—^4(a: 0 ^o)II = ulHI> 
for a given noise-level parameter y £ (0,1). 

For these experiments, we generate 100 instances with n = 128 for each pairwise 
combination (L,y) with L £ {6,9,12} and y £ {0.1%,0.5%, 1%,5%, 10%, 50%}. Ta¬ 
ble 5.3 summarizes the results of applying the three variations of GAUGE, and the WFLOW 
solver, to these problems. It is not clear that GAUGE-feas and WFLOW are relevant 
for this experiment, but for interest we include them in the results. As with the 
experiments in section 5.1.1, a solve is “successful” if it recovers the true solution with 
a relative error of 10 2 . The median relative error for all solvers is comparable, and 
hence we omit the column “xErr”. GAUGE-nodfp is generally successful in recovering 
the rank-1 minimizer for most problems—even for cases with significant noise, though 
in these cases the overall cost increases considerably. On the other hand, GAUGE is less 
successful: it appears that although the primal-dual refinement procedure can help to 
reduce the cost of successful recovery in low-noise settings, in high-noise settings it 
may obtain primal solutions that are not necessarily close to the true signal. For noise 
levels over 5%, GAUGE-feas and WFLOW are unable to recover a solution within the 
prescribed accuracy, which points to the benefits of the additional cost of obtaining a 
primal-dual optimal point, rather than just a primal feasible point. 

5.1.3. Two-dimensional signal. We conduct a second experiment on a stylized 
application in order to assess the scalability of the approach to larger problem sizes. 
In this case the measured signal x 0 is a two-dimensional image of size 1600 x 1350 
pixels, shown in Figure 5.1, which corresponds to n = 2.2 • 10 6 . The size of the lifted 
formulation is on the order of n 2 ~ 10 12 , which makes it clear that the resulting SDP 
is enormous, and must be handled by a specialized solver. We have excluded TFOCS 
from the list of candidate solvers because it cannot make progress on this example. 
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Table 5.3 

Phase retrieval comparisons for problems with noise, i.e., e > 0. Numbers of the form n_ e are a 
shorthand for n ■ 10 . 


L 

V 

GAUGE 

GAUGE-nodfp 

GAUGE- 

:eas 

WFLOW 

nDFT 

% 

nDFT 

% 

nDFT 

% 

nDFT 

% 

12 

0 .1% 

4,584 

100 

29 

988 

100 

936 

100 

14,856 

100 

9 

0 .1% 

3,222 

100 

36 

292 

100 

774 

100 

11,511 

100 

6 

0 .1% 

2,232 

100 

50 

235 

100 

612 

100 

8,922 

98 

12 

0.5% 

3,768 

100 

27 

252 

100 

936 

100 

14,808 

100 

9 

0.5% 

2,934 

100 

31 

032 

100 

774 

100 

11,430 

100 

6 

0.5% 

2,148 

100 

38 

766 

100 

606 

100 

8,790 

98 

12 

1 .0% 

3,744 

100 

22 

620 

97 

936 

100 

14,712 

100 

9 

1 .0% 

2,934 

100 

24 

813 

96 

774 

100 

11,331 

99 

6 

1 .0% 

1 • 10 7 

97 

2 • 

10 5 

98 

600 

53 

8,634 

8 

12 

5.0% 

95,952 

26 

9- 

10 5 

90 

936 

0 

14,148 

0 

9 

5.0% 

2 • 10 6 

89 

8 - 

10 5 

90 

774 

0 

10,701 

0 

6 

5.0% 

2 • 10 6 

82 

5- 

10 5 

98 

600 

0 

7,728 

0 

12 

10 .0% 

1 • 10 5 

17 

1 ■ 

10 6 

89 

912 

0 

13,548 

0 

9 

10 .0% 

8 • 10 5 

78 

7- 

10 5 

91 

765 

0 

10,125 

0 

6 

10 .0% 

7 • 10 5 

90 

5- 

10 5 

100 

588 

0 

7,098 

0 

12 

50.0% 

2 • 10 5 

53 

5- 

10 5 

94 

888 

0 

11,424 

0 

9 

50.0% 

1 ■ 10 5 

42 

3- 

10 5 

99 

738 

0 

8,586 

0 

6 

50.0% 

1 • 10 5 

24 

2 ■ 

10 5 

95 

588 

0 

7,176 

0 



Fig. 5.1. Image used for phase retrieval experiment; size 1600 X 1350 pixels (7.5MB). 


We generate 10 and 15 octanary masks. Table 5.4 summarizes the results. The column 
headers carry the same meaning as Table 5.1. 


5.2. Blind deconvolution. In this blind deconvolution experiment, the con¬ 
volution of two signals s x £ C m and s 2 £ C m are measured. Let B x £ C mxri1 and 
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Table 5.4 

Phase retrieval comparisons on a two-dimensional image. 



GAUGE 

GAUGE-feas 

WFLOW 

L 

nDFT 

xErr 

nDFT 

xErr 

nDFT 

xErr 

15 

200,835 

2.1 ■ 10 -6 

5,700 

2.1 • 10" 6 

8,100 

4.1 ■ 10“ 6 

10 

195,210 

5.8- 10” 7 

12,280 

9.1 ■ 10 -7 

12,340 

2.1 ■ 10“ 5 


Table 5.5 

Blind deconvolution comparisons. 


solver 

nDFT 

nDWT 

xErrl 

xErr2 

rErr 

aug Lagrangian 

92,224 

76,872 

7.9- 10“ 2 

5.0- 10 _1 

1.4-10~ 4 

GAUGE 

17,432 

8,374 

1.9- 10 -2 

5.4- 10 _1 

3.8- 10” 4 

GAUGE-feas 

4,128 

2,062 

8.4- 10“ 2 

5.5- 10 _1 

4.0- 10“ 4 


B 2 £ c mx " 2 be two bases. The circular convolution of the signals can be described by 

b = * s 2 = {Bi'Xi) * ( B 2 x 2 ) 

= F” 1 diag {{FB 1 x 1 )(F B 2 x 2 ) T ) 

= F~ l diag {{FB 1 )(x 1 x* 2 ){FB 2 )*) =: 

where A is the corresponding asymmetric linear map with the adjoint 

A*y := (FF 1 )*Diag(Fy)(FB 2 ). 

Because F is unitary, it is possible to work instead with measurements 
b = Fb = diag ({FB 1 )(x 1 x* 2 ){FB 2 )*) 

in the Fourier domain. For the experiments that we run, we choose to work with the 
former real-valued measurements b because they do not require accounting for the 
imaginary parts, and thus the number of constraints in (1.4) that would be required 
otherwise is reduced by half. 

We follow the experimental setup outlined by Ahmed et al. (2014) and use the 
data and code that they provide. In that setup, B 1 is a subset of the Haar wavelet 
basis, and B 2 is a mask corresponding to a subset of the identity basis. The top 
row of Figure 5.2 shows the original image, the blurring kernel, and the observed 
blurred image. The second row of the figure shows the image reconstructed using the 
augmented Lagrangian code provided by Ahmed et al. (2014), GAUGE, and GAUGE-feas. 
Table 5.5 summarizes the results of applying the three solvers. The columns headed 
“nDFT” and “nDWT” count the number of discrete Fourier and wavelet transforms 
required by each solver; the columns headed “xErrl” and “xErr2” report the relative 
errors ||ajj — x t l^/ll^ilh) * = 1,2, where ir, are the recovered solutions; the column 
headed “rErr” reports the relative residual error \\b — ^(aqa^lh/IHh- Although the 
non-convex augmented Lagrangian approach yields visibly better recovery of the image, 
the table reveals that the solutions are numerically similar, and are recovered with far 
less work. 
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(a) 


(b) 




(d) (e) (f) 


Fig. 5.2. Images used for the blind deconvolution experiments: (a) original image; (b) zoom of 
the motion-blurring kernel; (c) blurred image; (d) image recovered by the augmented Lagrangian 
approach; (e) image recovered by GAUGE; (f) image recovered by GAUGE-feas. The shaded background 
on the recovered images is an artifact of the uniform scaling used to highlight any error between the 
original and recovered signals. 


6. Extensions. The problems (1.1) that we have considered so far are stated 
in their simplest form. General semidefmite optimization problems with nonnegative 
optimal value, and reweighted formulations for rank minimization, as introduced by 
Mohan and Fazel (2010) and Candes, Eldar, Strohmer, and Voroninski (2013), are 
also useful and can be accommodated by our approach. 

In the context of rank minimization over the PSD cone, an approximate minimum- 
rank solution A (e.g., computed via trace minimization) might be used to obtain 
an even better approximation by using the weighted objective (C, A'), where C := 
(SI + A) -1 and S is a small positive parameter. We might reasonably expect that 
the objective value at a minimizer of this objective more closely matches the rank 
function. Candes et al. (2013) show that such an iteratively reweighted sequence 
of trace minimization problems can improve the range of signals recoverable using 
PhaseLift. Each problem in that sequence uses the previous solution X to derive a 
weighting matrix C = (SI + ZZ *)for the next problem. The inverse of the matrix C 
is a low-rank update ZZ* ss X to a small regularizing multiple S of the identity matrix. 
This idea generalizes that of reweighting the 1-norm for cardinality minimization 
problems in compressed sensing, where the number of nonzero entries of a vector x is 
approximated by f° r small S and an available approximation x (e.g., 

computed via 1-norm minimization). 

In the next sections we derive the corresponding gauge duals for the weighted for- 
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mulations of both trace minimization in the PSD cone and nuclear-norm minimization. 

6.1. Nonnegative semidefinite optimization. Consider the semidefinite op¬ 
timization problem 

minimize (C,X) subject to 116 — AXj| < e, X ^ 0, (6.1) 

where C >- 0. Dehne the maps 

C(-):=C~i(-)Crt and A c :=AoC~\ 

It is evident that X >; 0 if and only if C(X) >; 0, and so the SDP problem can be 
stated equivalently as 

minimize trace C(X) subject to \\b — Ac(C(X))\\ < e, C(X) >z 0. 

Because C is a bijection, we can optimize over X := C(X) instead of X: 

minimize trace X subject to ||& — .4 C A|| < e, X >: 0. (6.2) 

x ew" 

This clearly falls within the structure of (1.1a), and has the corresponding gauge dual 

minimize [A^Apy)^ subject to (b, y) - e||y||* > 1. (6.3) 

ye ® 

Observe that X 1 (Acy) = A 1 (C _5 (A*y)C~ 2 ) = Xi(A*y,C). Then 

minimize [X 1 (A*y, C)] + subject to (6, y) — e||y||* > 1. (6.4) 

yes™ 

This shows that the introduction of a weighting matrix C that is not a simple 
multiple of the identity leads to a dual gauge problem involving the minimization of 
the rightmost generalized eigenvalue of A*y with respect to that weight. Now that 
we have a formulation for the gauge dual problem, we focus on how a primal solution 
to the original weighted trace minimization can be computed given a dual minimizer. 
This extends Corollary 3.3. 

Corollary 6.1. Suppose that problem (6.1) is feasible and 0 < e < ||6||. Let 
y £ R m be an arbitrary optimal solution for the dual gauge (6.4), rq £ {1,... ,n) be 
the multiplicity of X 1 (A*y,C), and U l £ C" ri be the matrix formed by the first r 1 
generalized eigenvectors of A*y with respect to C. Then X £ TL n is a solution for the 
primal problem (6.1) if and only if there exists S >: 0 such that 

X = U±SU{ and ( b - AX) £ e<9|| ■ ||*(y). 

Proof. A solution for (6.4) is clearly a solution for (6.3). We may thus invoke 
Corollary 3.3 and assert that X £ Tl n is a solution for (6.2) if and only if there is 
S h 0 such that X = U^SUf and (6 — A C X) £ e<9|| • ||*(y), where U 1 £ C nxri is 
a matrix formed by the first rq eigenvectors of Acy = C 2 (A y)C 2 . From the 
structure of C, we have that A is a solution to (6.1) if and only if X = C(X). Thus, 
X = C~ 2 U 1 SUfC~ 2 = U 1 SUf, where U 1 := C~ 2 U 1 corresponds to the first rq 
generalized eigenvectors of A*y with respect to C. □ 

Once again, this provides us with a way to recover a solution to the weighted 
trace minimization problem by computing a solution to the gauge dual problem 
(now involving the rightmost generalized eigenvalue) and then solving a problem of 
potentially much reduced dimensionality. 
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6 .2. Weighted affine nuclear-norm optimization. We can similarly extend 
the reweighted extension to the asymmetric case (1.1b). Let C 1 £ 'H rtl and C 2 € 'H” 2 
be invertible. The weighted nuclear-norm minimization problem becomes 

minimize HC^AC^ ||i subject to \\b — AX\\ < e. (6.5) 

xec " 1 *” 2 


Define the weighted quantities 

C(-) = : C niX " 2 —>■ C niX " 2 , A c = AoC, and A' := C( A). 


The weighted problem can then be stated equivalently as 

minimize ||A|| x subject to \\b—AcX\\<e, 

X6C“i x “2 


which, following the approach introduced in Fazel (2002), can be embedded in a 
symmetric problem: 


minimize 

UEH" 1 

veU™ 2 

xec " 1>! " 2 




subject to 




and 


\\b-A c X\\<e. 


( 6 . 6 ) 


Define the measurement operator from M : Ff ' !l + n 2 c m by the map 




1 t AqX , 


and identify C m with M 2m as a real inner-product space. The adjoint of the measure¬ 
ment operator is then given by 


M*y 


( 0 A* c y\ 

\{AcvT 0 ) 


where A*cV = \ YYL i Ci 1 A i C 2 *Vi- We can now state the gauge dual problem: 

minimize \\AM*y, hI)] . subject to 91(6, y) — e||?/|L > 1. (6.7) 


Observe the identity 


A 1 (M*y, 1 2 l)= Xi(2M*y) 


x , 0 YZiCI l A l c 2 *y l 

11 (E™i c^A.crviT o 


j + 


WCY(A*y)C 2 


= || cY(A*y)C 2 


We can now deduce the simplified form for the gauge dual problem: 

minimize ||C'r 1 (yl*t/)C' 2 "*|| 00 subject to 91(6, y) - e\\y\\* > 1. 

•yeC 


( 6 . 8 ) 
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This weighted gauge dual problem can be derived from first principles using the 
tools from section 2 by observing that the primal problem is already in standard 
gauge form. We chose this approach, however, to make explicit the close connection 
between the (weighted) nuclear-norm minimization problem and the (weighted) trace- 
minimization problem described in section 6.1. 

The following result provides a way to characterize solutions of the nuclear norm 
minimization problem when a solution to the dual gauge problem is available. 

Corollary 6.2. Suppose that problem (6.5) is feasible and 0 < e < ||6||. Let y £ 
C m be an arbitrary optimal solution for the dual gauge problem (6.8), r 1 £ {1, ...,n} 
be the multiplicity of a 1 (Cf 1 (A*y)Cf*), Ui £ C" lXri andVi £ C n2Xri be the matrices 
formed by the first ri left and right singular-vectors of Ci 1 (A*y)C 2 *, respectively. 
Then X £ C niX " 2 is a solution for the primal problem (6.5) if and only if there exists 
Sh 0 such that X = (C , r 1 t/ 1 )5'(C , 2 _ and (b - AX) £ e<9|| • \\,(y). 

Proof. A solution for (6.8) is clearly a solution for (6.7); this way we invoke 
Corollary 3.3 and have that ([/, V, X) £ H" 1 x 'H ri2 x C niXn2 induce a solution for (6.6) 
if and only if there is S >z 0 such that X = U 1 SVf and (b — AqX) £ ed|| • ||*(y), 
where U 1 £ C UlXri and V 1 £ C" 2Xri are matrices formed by the first rq left and right 
singular-vectors of A*cy = Cf 1 (A* y)C 2 *. From the structure of C, we have that X is 
a solution to (6.5) if and only if X = C(X). This way, X = (Cf 1 U 1 )S(C 2 1 V 1 )*. □ 

7. Conclusions. The phase retrieval and blind deconvolution applications are 
examples of convex relaxations of non-convex problems that give rise to large spectral 
optimization problems with strong statistical guarantees for correctly reconstructing 
certain signals. One of the criticisms that have been leveled at these relaxation 
approaches is that they lead to problems that are too difficult to be useful in practice. 
This has led to work on non-convex recovery algorithms that may not have as-strong 
statistical recovery guarantees, but are nonetheless effective in practice; Netrapalli, 
Jain, and Sanghavi (2013); Candes et al. (2015); White, Sanghavi, and Ward (2015). 
Our motivation is to determine whether it is possible to develop convex optimization 
algorithms that are as efficient as non-convex approaches. The numerical experiments 
on these problems suggest that the gauge-dual approach may prove effective. Indeed, 
other convex optimization algorithms may be possible, and clearly the key to their 
success will be to leverage the special structure of these problems. 

A theoretical question we have not addressed is to delineate conditions under 
which dual attainment will hold. In particular, the conclusion (3.1) of Theorem 3.1 is 
asymmetric: we can assert that a primal solution exists that attains the primal optimal 
value (because the Lagrange dual is strictly feasible), but we cannot assert that a dual 
solution exists that attains the dual optimal value. A related theoretical question is to 
understand the relationship between the quality of suboptimal dual solutions, and the 
quality of the primal estimate obtained by the primal recovery procedure. 

In our experiments, we have observed that the rightmost eigenvalue of A*y remains 
fairly well separated from the others across iterations. This seems to contribute to the 
overall effectiveness of the dual-descent method. Is there a special property of these 
problems or of the algorithm that encourages this separation property? It seems likely 
that there are solutions y at which the objective is not differentiable, and in that case, 
we wonder if there are algorithmic devices that could be used to avoid such points. 

The dual-descent method that we use to solve the dual subproblem (cf. section 4.1) 
is only one possible algorithm among many. Other more specialized methods, such 
as the spectral bundle method of Helmberg and Rendl (2000), its second-order vari¬ 
ant (Helmberg, Overton, and Rendl, 2014), or the stochastic-gradient method of 
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d’Aspremont and Karoui (2014), may prove effective alternatives. 

We have found it convenient to embed the nuclear-norm minimization prob¬ 
lem (1.1b) in the SDP formulation (1.1a) because it allows us to use the same solver 
for both problems. Further efficiencies, however, may be gained by implementing a 
solver that applied directly to the corresponding gauge dual 

minimize subject to y) - e||y||* > 1. 

yeC m 

This would require an iterative solver for evaluating leading singular values and singular 
vectors of the asymmetric operator A*y, such as PROPACK (Larsen, 2001). 
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